1. Summary

Education is associated with better health. However, it is still unclear
whether it has a causal impact. Natural experiment studies allow better 
assessment of causality. Allocation to selective secondary schooling, 
where the length of education is longer, was often based on achieving a 
certain level on a test score. This allows a regression discontinuity design 
that makes use of the fact that those either side of the cut-point on the 
test score are similar apart from the type of secondary school they are 
allocated to. Here we propose to use data from a 1950s Aberdeen birth
cohort for whom test score, secondary school attended and later life 
health are available to test the impact of secondary schooling on health. 


2. Introduction 
2.1 Background 

Education has long been regarded as important for adult health and 
health inequalities [1]. Given that policy influences the amount, quality, 
and distribution of education, it is important to study to what degree 
education impacts health. 

One difficulty is that controlled experimental studies for long-term 
outcomes like adult health are impractical, and this means we rely on 
observational studies. However, because background socio-economic
characteristics are key drivers of education, confounding is a major issue
in observational studies that is difficult to account for [2,3]. Because those
attending different types of school will differ, it may be their background
rather than schooling that affects adult health.

Natural experiments provide a way around this by controlling for
confounding through design rather than through statistical control.
Historically, state selective schools (grammar and senior secondary in
England and Scotland, respectively) chose students based on scores from
tests given at the end of primary schooling, around age 11. This opens up
the possibility of a natural experiment, as those with similar test scores
who fell one side or the other of the cut-off are likely to have similar
background characteristics, but attended different schools as if
randomised.


Such a situation is ideal for a regression discontinuity study. Previous 
research has used this method with the Aberdeen Children of the 1950s 
cohort to assess selective schooling's impact on economic outcomes [4, 
5]. It showed that selective schooling increased the length of education, 
the likelihood of getting higher qualifications, and the likelihood of having 
a professional occupation. However, selective schooling only slightly 
increased income for women and did not increase income for men [4, 5]. 

The Aberdeen Children of the 1950s cohort is a large population with 
extensive early-life records of schooling and socio-demographics, and 
follow-up later in life. This allows evaluation of the natural experiment in 
schooling assignment, and here we propose to analyse the effect of this 
assignment on long-term health. 


2.2 Rationale 

This study fits closely with the remit of the Administrative Data Research
Centre - Scotland (ADRC-S) as it makes use of a historical cohort linked to
contemporary administrative data to explore a key social determinant of
health. It also fits with the aims of the inequalities and policy programmes
of the Medical Research Council/Chief Scientist Office Social and Public
Health Sciences Unit, University of Glasgow, as these have a focus on
using natural experiments to understand policy-driven social determinants
of health and health inequalities.


2.3 Aims/Objectives/Research questions 

1) Is attending selective schools associated with a different risk of poor 
health in later life compared to attending a non-selective school? 

2) Does the association with selective schooling vary by childhood 
socio-economic background? 

3. Study Design/Methods

3.1 Study Design 

Population: 12,150 members of the Aberdeen Children of the 1950s 
(ACONF) cohort [6]. All were born between 1950 and 1956 in the city of
Aberdeen and attended primary school there in the 1960s (see section
3.2).

Data: Anonymized linkages will be made between the following previously 
collected data sets: 

1) data from age 6-12 - cognitive test scores; verbal and mathematics
test scores; teachers’ assessments; family sociodemographic
information (collected as part of ACONF study)

2) data from mid-life - responses to a survey of physical and mental
health, education history, and sociodemographic characteristics 
(collected as part of ACONF study) 

3) current data - diagnoses from hospital admission and mental health 
inpatient databases from Scottish Morbidity Records (SMR01 and
SMR04); all-cause mortality from National Records Scotland 

Exposure: allocation to selective vs non-selective state secondary school 
by test score at age 11/12. 

Students were assigned to secondary school as follows: those with a test 
score of below 540 were assigned to a non-selective school, and those 
with a test score of 580 or higher were assigned to a selective school. 
Students who scored 560 to 579 were assigned to a selective school 
provided one of their IQ scores was 112 or more and the head teacher 
said they were suitable. Students scoring 540 to 559 and those remaining 
from the 560 to 579 bracket were allocated to remaining selective school 
places by an appeals committee. Secondary school attended was 
surveyed in 2001 as part of an extensive postal questionnaire. 
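
The allocation rule described above can be sketched as a small function. This is only an illustration under stated assumptions: the function name, argument names, and return labels are hypothetical, and the appeals committee step is represented as a label because its decision criteria are not given here.

```python
from typing import Sequence


def allocation_route(test_score: int, iq_scores: Sequence[int], head_teacher_ok: bool) -> str:
    """Return the allocation route implied by the rule described in the text.

    All names and labels are illustrative; they do not come from the ACONF data.
    """
    if test_score < 540:
        return "non-selective"
    if test_score >= 580:
        return "selective"
    # 560 to 579: selective if at least one IQ score is 112 or more
    # and the head teacher judged the student suitable.
    if test_score >= 560 and head_teacher_ok and any(iq >= 112 for iq in iq_scores):
        return "selective"
    # 540 to 559, plus those remaining from the 560 to 579 bracket.
    return "appeals committee"


print(allocation_route(565, [108, 113], head_teacher_ok=True))
```

Because the final branch depends on decisions not determined by the score, attendance in the 540 to 579 band is probabilistic rather than fixed by the test score.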

Outcomes: 

1) physical and mental health in middle-age 

2) current physical and mental health 

3) all-cause mortality 

Two-thirds of cohort members answered a questionnaire in 2001 when 
they were in their mid-forties. The survey asked members to rate their 
overall health, and to complete a 4-item version of the General Health
Questionnaire (GHQ-4) to rate mental health. Self-rated health is a simple 
but powerful measure of general health [7]. 

We will also calculate a count of morbidities from SMR01 and SMR04 
using ICD-10 diagnosis code algorithms for chronic conditions [8,9]. 
Morbidity scores will be calculated for the period of 1997-2001 to evaluate 
health at the time of the survey, and for 2011-2015 to evaluate current 
health. All-cause mortality will be gathered from the death registry from 
National Records Scotland. 

Covariates 

We also use the following variables from the ACONF database: sex, age, 
school grade in 1962, IQ test scores at 7 and 9, father's social class at 
birth and in 1962, 1961 neighbourhood-level information on home 
ownership, crowding, and amenities (hot water, cold water, fixed bath, 
and toilet). 


Methods: This study will use a regression discontinuity design. 

In such a design, allocation to the exposure depends on the value of a 
continuous variable. In our case, entry to a selective state school 
depended on the scores of an entrance test taken at around age 11. The
design utilises the fact that if there was no treatment there would be a
smooth relationship between the continuous variable and the outcome. 

For example, we know that there is a smooth negative association 
between cognitive ability tests in childhood and adverse adult health 
outcomes [10]. 

Because the exposure allocation is based on achieving a certain test 
score, a discontinuity (or jump) in the smooth relationship is introduced if 
the exposure (selective schooling) has an effect. If allocation to exposure 
is only dependent on the admissions score and not any other 
characteristics of the individual, then at either side of the discontinuity 
individuals will be similar, so that any jump in the outcome can be
causally attributed to the exposure. 

Ours is not a sharp discontinuity (where all those scoring above one value
are exposed). Although there was a minimum score below which the
probability of attending a selective school was effectively zero and an
upper score above which the probability of attending was nearly 1, there
was an intermediate band where exposure probability increased at certain
scores, so ours is a fuzzy discontinuity. Our analysis will use
graphical methods to test assumptions and to present results before using
regression techniques to formally model any effect. This is outlined in
section 3.8.


3.2 Settings 

The Aberdeen Children of the 1950s (ACONF) cohort study follows 12,150 
individuals born in the city of Aberdeen between 1950 and 1956. The 
research on these individuals began as the Aberdeen Child Development 
Survey (ACDS), undertaken by the Medical Research Council Medical 
Sociology Unit in Aberdeen. The ACDS studied all children in primary 
school in Aberdeen in December 1962 between the ages of six and 12 
years. It collected detailed information including: records of routine 
cognitive test scores conducted at ages 7, 9, and 11 years; scores on 
tests conducted at the time of the 1962 survey; teachers' assessments; 
and sociodemographic information from the 1961 census. 

In 1999 the project was continued as ACONF when a grant was obtained 
from the Medical Research Council to follow up the children from ACDS. 
Permission was given by the Scottish MultiCentre Research Ethics 
Committee and the Privacy Advisory Committee to trace study members 
through the General Register Office Scotland and to flag all traced and 
surviving study members through the National Health Service Central 
Register. Two-thirds of members also completed an extensive postal 
survey in 2001. 

3.3 Participant Selection 

The whole cohort is eligible for our study and so we request
access to the entire population.

3.4 Recruitment


All participants are already recruited. We will make no contact with them, 
and will use only previously collected data. 

3.5 Withdrawal and loss to follow up 

Multiple imputation and/or inverse probability weighting will be used to
minimise bias from loss to follow-up due to failure to respond to
the survey, emigration, or non-linkage to mortality or morbidity records.

3.6 Study Procedures 

Once appropriate approvals have been secured, linkage between the 
cohort study data and the administrative medical data described above 
will be done by eDRIS as part of the National Safe Haven. They will then 
provide us with the de-identified linked dataset for use in this study (see
sections 4.2-4.4).

3.7 Data Collection 

No new data will be collected. 


3.8 Data Analysis 

We will use existing procedures to conform to best analytical practice in 
health-based regression discontinuity designs [11,12].

In the first stage we will use graphical analysis to check assumptions of 
the regression discontinuity design and to graphically show the results. 
We will conduct analysis for the whole sample and also by sex, primary 
school grade, and social class background if numbers are sufficient. 

1) To confirm the discontinuity in treatment we will graph the test 
score against exposure probability. We will calculate probability of 
exposure by 5-point "bins" of the test score, for example 535 to 
539, 540 to 544 etc., but will widen these if there are not sufficient 
numbers to 10-point bins. 

2) To check that there is no bunching of test scores around the cut 
points we will produce a histogram of the test scores. If there is 
bunching this would indicate some manipulation of the system and 
thus potential bias to the results. 

3) To check that there is no discontinuity in the distribution of 
background confounders at the cut points for selective school 
allocation we will graph potential confounders by the test score. 
Within the 540 to 579 test score range where the test score is not 
the only factor in allocation, we will also look at confounder
distribution amongst those who were allocated to a selective school
and those not. 

4) We will graph each outcome by test score. If we see a discontinuity 
(jump) at the cut points this will suggest an effect of exposure. 
Within the 540 to 579 test score range where the test score is not 
the only factor in allocation, we will also look at outcomes for those 
who were allocated to a selective school and those not. 
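
As an illustration of the binning in step 1, the sketch below computes the proportion exposed within 5-point bins of the test score and falls back to 10-point bins when any bin is sparse. The function names, the `min_count` threshold, and the toy data are all assumptions made for illustration; the protocol does not specify a threshold. The per-bin counts it returns are also the quantities a histogram for step 2 would display.

```python
from collections import defaultdict


def binned_exposure(scores, exposed, bin_width=5):
    """Return {bin lower edge: (n, proportion exposed)} for fixed-width score bins."""
    counts = defaultdict(int)
    hits = defaultdict(int)
    for score, flag in zip(scores, exposed):
        lower = (score // bin_width) * bin_width  # e.g. 537 -> 535 when bin_width is 5
        counts[lower] += 1
        hits[lower] += int(flag)
    return {b: (counts[b], hits[b] / counts[b]) for b in sorted(counts)}


def choose_bins(scores, exposed, min_count=20):
    """Use 5-point bins, widening to 10-point bins if any bin is too sparse.

    min_count is a hypothetical threshold chosen for this sketch.
    """
    narrow = binned_exposure(scores, exposed, bin_width=5)
    if all(n >= min_count for n, _ in narrow.values()):
        return narrow
    return binned_exposure(scores, exposed, bin_width=10)


# Made-up toy data, for illustration only.
toy_scores = [535, 537, 541, 543, 544, 581]
toy_exposed = [0, 0, 1, 0, 1, 1]
for lower, (n, share) in binned_exposure(toy_scores, toy_exposed).items():
    print(lower, n, round(share, 2))
```

Plotting the resulting proportions against the bin edges would give the treatment discontinuity graph; the same binning applied to a confounder or an outcome mean would serve steps 3 and 4.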

In the second stage we will use regression models to formally estimate 
the exposure effect. 

We expect a smooth relationship between the test score variable and the 
outcome, with higher test scores being associated with better health. If 
there is an effect of selective schooling then this should introduce a 
discontinuity. This implies a basic model of the outcome modelled as a 
function of the continuous test score and a dummy variable for treatment 
based on the cut point in the test score. 
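
One way to write this basic model is given below. The notation is an assumption introduced here for illustration: Y_i is the outcome, S_i the test score, c the cut point, and D_i the treatment dummy.

```latex
Y_i = \alpha + \tau D_i + \beta (S_i - c) + \varepsilon_i,
\qquad D_i = \mathbf{1}[S_i \ge c]
```

Under this notation, \tau is the size of the jump in the expected outcome at S_i = c, and \beta captures the smooth relationship between test score and outcome.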

It is usual practice to test for a non-linear relationship between the 
outcome and the test score in order to test whether any treatment effect 
at the discontinuity really is just a reflection of an underlying non-linear 
relationship. So we will test second, third and fourth order polynomials. 
The plot of the outcome in the graphs from stage one allows a strong 
visual check for overfitting that can occur when using high order 
polynomials. 

We will also check whether the underlying relationship between test score 
and the outcome varies either side of the cut points by including an 
interaction between treatment and test score. The regression can be 
estimated over the full range of the test score, but this can introduce bias 
if the functional form of the relationship between the outcome and the 
test score is not modelled correctly. So in addition to assessing whether 
there is a non-linear relationship, it is important to check the effect of
varying the bandwidth (the range of test scores included either side of the 
cut point for treatment) on the treatment effect estimates. We will 
explore a technique that attempts to find the optimal bandwidth but will 
vary the bandwidth as a sensitivity test. 
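
The bandwidth sensitivity check can be sketched as follows. This is a minimal illustration, not the planned analysis code: it fits a linear model with a treatment dummy and a treatment by score interaction by ordinary least squares, using only observations within a chosen bandwidth of a single cut point. The function names and the toy data (built with a known jump of 0.3 at 540) are assumptions.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def rd_jump(scores, outcomes, cut, bandwidth):
    """OLS estimate of the jump at `cut`, allowing a different slope on each side."""
    rows, ys = [], []
    for s, y in zip(scores, outcomes):
        x = s - cut
        if abs(x) <= bandwidth:
            d = 1.0 if x >= 0 else 0.0
            rows.append([1.0, d, x, d * x])  # intercept, treatment, score, interaction
            ys.append(y)
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)[1]  # coefficient on the treatment dummy


# Made-up, noise-free toy data with a jump of 0.3 at a score of 540.
toy_scores = list(range(480, 600))
toy_outcomes = [
    3.0 + 0.004 * (s - 540) + ((0.3 + 0.002 * (s - 540)) if s >= 540 else 0.0)
    for s in toy_scores
]
for h in (10, 20, 40):
    print(h, round(rd_jump(toy_scores, toy_outcomes, cut=540, bandwidth=h), 4))
```

With real data the estimates would differ across bandwidths; estimates that stay stable as the bandwidth narrows give some reassurance that the functional form is not driving the result.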

A complicating factor in our analysis is that in the 540 to 579 range, and
especially the 540 to 559 range, test score increases the probability but
does not determine that the student attends a selective school. This is
akin to non-compliance in an RCT. There are two analysis routes. The first is
akin to intention-to-treat, and reflects that although those in the
range 540 to 579 had a qualifying test score for a selective school,
places were not available for all. This would be the same analysis as
outlined previously, with one exposure variable based on achieving a test
score of 540 or above. The second, more robust, approach is akin to the
treatment effect for compliers in a trial. To evaluate this we will use an
instrumental variable approach where in the first stage probability of
exposure is modelled as a function of test score and the three cut points
and the interaction of test score and these cut points. In the second
stage, in the standard outcome model outlined previously, the treatment
effect variable is replaced by the probability of exposure from the first
stage.
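
In its simplest form, the instrumental variable idea reduces to a ratio: the jump in the outcome at the cut point divided by the jump in the probability of exposure. The sketch below shows that ratio (a local Wald estimate) using plain means in a narrow window either side of a single cut point. It is a deliberate simplification, not the two-stage model with three cut points and interactions described above, and the function name and toy data are assumptions.

```python
from statistics import mean


def wald_estimate(scores, exposed, outcomes, cut, bandwidth):
    """Return (intention-to-treat jump, first-stage jump, Wald ratio) near `cut`."""
    below = [(d, y) for s, d, y in zip(scores, exposed, outcomes)
             if cut - bandwidth <= s < cut]
    above = [(d, y) for s, d, y in zip(scores, exposed, outcomes)
             if cut <= s < cut + bandwidth]
    # Jump in the mean outcome across the cut point (intention-to-treat).
    itt = mean([y for _, y in above]) - mean([y for _, y in below])
    # Jump in the share exposed across the cut point (first stage).
    first_stage = mean([d for d, _ in above]) - mean([d for d, _ in below])
    return itt, first_stage, itt / first_stage


# Made-up toy data: nobody below 540 is exposed, half of those above are,
# and exposure raises the outcome by 0.4.
toy_scores = [536, 537, 538, 539, 540, 541, 542, 543]
toy_exposed = [0, 0, 0, 0, 0, 1, 0, 1]
toy_outcomes = [3.0 + 0.4 * d for d in toy_exposed]
itt, first_stage, wald = wald_estimate(toy_scores, toy_exposed, toy_outcomes, 540, 5)
print(round(itt, 3), round(first_stage, 3), round(wald, 3))
```

The intention-to-treat jump is diluted by those above the cut point who did not attend a selective school; dividing by the first-stage jump rescales it to the effect among compliers.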

Finally, because in the 540 to 579 range variables other than test score
(like head teacher assessment) influence treatment allocation, the
treatment effect in this range may be confounded by these other
variables. Thus we will rerun the main outcome model excluding those
with test scores between 540 and 579, a so-called donut design [13].
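
The donut restriction amounts to a simple filter applied before refitting the outcome model. In the sketch below the record layout, the function name, and the choice to define treatment by a score of 580 or above in the remaining sample are assumptions made for illustration.

```python
def donut_sample(records, lower=540, upper=579, cut=580):
    """Drop records with scores in [lower, upper] and flag treatment by score."""
    kept = []
    for rec in records:
        if lower <= rec["score"] <= upper:
            continue  # inside the excluded band
        # Assumption: outside the band, allocation follows the score alone.
        kept.append({**rec, "selective": int(rec["score"] >= cut)})
    return kept


# Made-up toy records, for illustration only.
toy_records = [{"score": s} for s in (530, 545, 579, 580, 600)]
print(donut_sample(toy_records))
```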

Sensitivity Analysis 

The exposure variable (secondary school attended) and some outcomes 
come from the 2001 survey of the cohort, to which two-thirds of the 
original participants responded. However, non-responders may remain 
included in this study, as we have the admissions test score for the whole 
cohort, and it is possible to use multiple imputation to validly impute 
school attended for those missing this variable, and to test the impact on 
our results. This analysis will be supported by data from a resurvey in 
1964 when some were in secondary school and their school was recorded. 
In addition, previous analysis suggests that below a score of 540 the
probability of selective schooling is zero and at 580 or above it is 1. This means
that we can assume that anyone with these scores was definitely
allocated to non-selective and selective schooling respectively, and run the
appropriate analysis on the full cohort.

The selection criteria for secondary schooling are reported to have
changed for the youngest class (in grade 3 in 1962). We will evaluate
this, in case there is a possibility that these participants could contribute 
to sensitivity analysis as they were still subject to school selection but by 
a different score. 

We will also explore the sensitivity of our results to including those
attending private (fee paying) secondary schools, who have been shown
to be similar to selective school attendees in the state sector.

Power 

A published table of statistical power for regression discontinuity designs 
[14] suggests that to detect a small difference (0.2 of a standard 
deviation) between two means at the 2.5% level (one-tailed) with 95%
power would require a sample size of 1753, and with 80% power a sample size
of 1078. For a large effect size (0.8 of a standard deviation) the sample
sizes were 109 and 68 respectively.

In the previous regression discontinuity analysis using this cohort [4] the 
reported sample size for the main estimation with optimal bandwidth was 
4644 for men and women combined. Results suggest that a regression
discontinuity design requires a sample around 2.7 times larger than the
corresponding RCT [14]. For example, average self-rated health in the
Aberdeen study at age 45 has been reported as 3.01 with a standard
deviation of 0.78 [15]. So for a small effect (0.2 of a standard deviation)
at the 5% level with 80% power, and taking account of the relative size of
the exposure groups, the regression discontinuity sample size applying
the correction is 2762; for a large effect (0.8 of a standard deviation) it is
201.
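
The structure of this calculation can be sketched as below: a standard two-sample sample size under a normal approximation, adjusted for unequal group sizes, then multiplied by the 2.7 correction. The function names and the `share_exposed` parameter are assumptions, and because the relative size of the exposure groups is not stated here, the sketch is not expected to reproduce the figures of 2762 and 201.

```python
from math import ceil
from statistics import NormalDist


def rct_total_n(effect_size, alpha=0.05, power=0.80, share_exposed=0.5):
    """Total N for a two-sided comparison of two means (normal approximation).

    effect_size is in standard deviation units; share_exposed is the
    proportion of the sample in the exposed group (hypothetical parameter).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return (z / effect_size) ** 2 / (share_exposed * (1 - share_exposed))


def rd_total_n(effect_size, inflation=2.7, **kwargs):
    """Apply the regression discontinuity correction factor to the RCT sample size."""
    return ceil(inflation * rct_total_n(effect_size, **kwargs))


for d in (0.2, 0.8):
    print(d, rd_total_n(d), rd_total_n(d, share_exposed=0.3))
```

The further the exposed share is from one half, the larger the required sample, which is why the relative size of the exposure groups matters to the figures quoted above.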

This suggests that our sample is adequately powered to detect relatively 
small effects. However, intricacies of the design mean that the actual 
correction may be higher than 2.7. Given this, we will assess our results in
totality rather than focusing solely on whether statistical
significance is achieved at an arbitrary level. We will assess whether the magnitude of
any effect size is plausible, for example whether it is in line with other 
studies. 


4. Research Governance and Regulatory Issues 
4.1 Ethical issues 

The Aberdeen Children of the 1950s study has received ethical and 
privacy advisory committee approval and is governed as outlined below. 

The cohort data is treated as a research platform and as such is 
registered as a Research Database through the North of Scotland 
Research Ethics Committee. Policy, driven by MRC guidelines, is to make 
anonymised data as widely available as possible, subject to all 
appropriate approvals and controls. The relevant section covering linkage
to SMR states: 'any applicant wishing to access linked SMR data on ACONF
respondents has first to apply to the Steering Group for overall scientific 
approval, and if that is granted, then has to apply to the Privacy Advisory 
Committee (PAC) for permission to have specific linkages made, 
appropriate to the research question being addressed. The linkages would 
be made in liaison between the study manager and ISD (the Information 
and Statistics Division at NHS National Services Scotland), and only 
anonymised data subsets would be released to researchers'. 

The ACONF project holds approvals from the Scottish MultiCentre 
Research Ethics Committee, London School of Hygiene and Tropical 
Medicine ethical committee, Grampian Local Research Ethics Committee, 
National Research Ethics Service - North of Scotland Committee, and the 
Privacy Advisory Committee. 

We will seek approval of the ACONF Steering Group to use cohort data. 
This project will also go through the Administrative Data Research 
Network approvals process for data linkage, and because of the proposed 
data linkage to NHS data, through the Public Benefit and Privacy Panel for 
Health and Social Care. The University of Glasgow will be the sponsor and 
this will be confirmed by the signing of this protocol. We have 
confirmation that we do not require additional ethical permissions for this 
project.

4.2 Data Monitoring

In addition to our team, data management will also be monitored by an 
independent research coordinator at the ADRC-S. A data specification will 
be prepared ahead of linkage. We will request metrics on linkage 
certainty, and assess the quality of the linked data by exploring 
distributions and assessing out of range/implausible values. 

4.3 Data Management 

Data linkage will be done and the results de-identified by eDRIS as part of 
the National Safe Haven. We will then have access to the resulting dataset
on a secure safe haven virtual desktop environment via the ADRC-S. Only 
named researchers, with appropriate information governance training, can 
use the safe haven. All outputs from the safe haven are screened for 
disclosure control by an independent research coordinator prior to 
release. 

As this is a secondary data project, and the datasets are curated, hosted 
and controlled by others, we have not completed a separate data 
management plan as this is covered by our applications to access the 
data and the conditions of those controlling the data. 

4.4 Data Storage and Retention 

The anonymized, linked dataset created for this project will be retained 
within the national safe haven for 10 years, per Medical Research Council 
guidelines. Individual-level data cannot be released, but to facilitate 
replication of our analysis, we will make available all data management 
and analysis code once cleared for release at publication. 

5. Project Management


5.1 Project Manager 

The Project Manager with responsibility for the day to day management of 
the project is: Jessica Butler 

5.2 Project Management Group 


The Project Management Group will normally consist of the PI, Co- 
Investigators, Project Manager and Project Team members (e.g., staff 
employed on or contributing to the project). 

The Project Team consists of the following members: 

| Name | Division/Organisation |
|---|---|
| Frank Popham | MRC/CSO SPHSU, University of Glasgow |
| Corri Black | Institute of Applied Health Sciences, University of Aberdeen |
| Jessica Butler | Institute of Applied Health Sciences, University of Aberdeen |
| Peter Craig | MRC/CSO SPHSU, University of Glasgow |
| Chris Dibben | Geosciences, University of Edinburgh |
| Ruth Dundas | MRC/CSO SPHSU, University of Glasgow |
| Michele Hilton Boon | MRC/CSO SPHSU, University of Glasgow |
| Marjorie Johnston | Institute of Applied Health Sciences, University of Aberdeen |


The Project Management Group will meet monthly. 

Minutes of PMG meetings will be taken on the SPHSU template and a 
Decision Log will be created and maintained by the Project Manager. 


5.3 Advisory Group / Steering Committee 

This project will be carried out in consultation with the core team and 
research coordinators at the Administrative Data Research Centre 
Scotland. In addition we will seek input from senior social researchers at 
the Scottish Government. 

5.4 Project Filing Structure 

The electronic project files will be kept on: the University of Aberdeen 
server, with archiving at the University of Glasgow 

Paper project files will not be kept. 


6. Dissemination 

6.1 Communication method 


The key communications channels are: 

• Journal papers 

• Conferences - NHS Research Scotland Conference; Administrative
Data Network Conference; and Society for Social Medicine

• Aberdeen Children of the 1950s website, Facebook page, and annual
study newsletter 

• Explorathon Scotland 

• Articles in The Conversation 

• SPHSU Twitter / Blog 


6.2 Publication Policy 

There will be one main results publication and a possible methods paper
(aimed at a social epidemiology journal, as regression discontinuity is
less widely used in that field). The results publication will be aimed at a
high impact journal; exact journals are to be discussed by the project team. All members
of the project team who meet the venue's authorship standards will be 
authors, and the order is suggested as Butler first (as lead researcher 
with the expectation of drafting the paper), Popham last, and alphabetical 
in-between. 

All publications and presentations relating to the project will be authorised 
by the Project Management Group. 


6.3 Public Engagement and Knowledge Exchange 

The Aberdeen Children of the 1950s study has an active engagement 
programme. Every year, 5500 study members receive the study 
newsletter detailing research that has used the study's data. There is an 
active Facebook page ( www.facebook.com/aberdeenbirthcohorts ) where 
news on recent research and conferences is presented. All publications 
and infographics of their findings are posted on the Aberdeen Children of 
the 1950s study website ( www.abdn.ac.uk/birth-cohorts ). Researchers on 
the team also reach out to the public in person throughout the year. 
Recent events that described research and sought public input include: an 
Aberdeen Children of the 1950s reunion; presentations of research at the 
large-scale public events Explorathon'15 and Mayfest; talks at local social 
clubs (Rotary, etc); interviews with local and national press about the 
study; a short film about study research; and annual meetings with a 
focus group of study participants. 

The ADRC-S has a knowledge exchange strand, and we will seek its input in
disseminating the findings to a policy audience, as well as making direct
contact with Scottish Government education research.


7. Project Milestones / Timelines 

The following sets out the key project milestones and the points at which key
decisions must be taken:

Approvals applications (6 months), data preparation and analysis (6 
months) and write up and dissemination (6 months)

8. Project Risk Assessment


The risks relevant to the project are recorded in the risk assessment form 
and contained in the initial Project Risk/Issue log on: the project folder 

The Risk Log will be reviewed and updated at Project Management Group 
meetings.